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Abstract 

We propose a Monte Carlo algorithm to promote Kennedy and Kuti's linear 
accept/reject algorithm which accommodates unbiased stochastic estimates of 
the probability to an exact one. This is achieved by adopting the Metropolis 
accept/reject steps for both the dynamical and noise configurations. We test 
it on the five state model and obtain desirable results even for the case with 
large noise. We also discuss its application to lattice QCD with stochastically 
estimated fermion determinants. 
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1 Introduction 



Usually Monte Carlo algorithms require exact evaluation of the probability ratios in 
the accept/reject step. However, there are problems in physics which involve extensive 
quantities such as the fermion determinant which require V 3 steps to compute ex- 
actly. Thus the usual Monte Carlo algorithms for a large volume are not numerically 
applicable to such problems directly. To address this problem, Kennedy and Kuti 
|]J proposed a Monte Carlo algorithm which admits stochastically estimated transi- 
tion probabilities as long as they are unbiased. This opens up the door to tackling 
problems when it is feasible to estimate the transition probabilities but intractable or 
impractical to calculate them exactly. 

The acceptance probability (denoted as P a from now on) in Kennedy-Kuti's linear 
algorithm is 

P (T r rn _ J A+ + A- (e AH ) , if /(C/0 > f(U 2 ) , m 



if /(C/i) < f(U 2 ) 

where A^ are tunable real parameters ranging from to 1, (e AH ) denotes an unbiased 
estimator of the transition probability e AH with AH = H{U\) — IKJJ2). U\ denotes 
the old configuration and U2 the new or proposed configuration. f(U) is an observable 
of the configuration U adopted for ordering between U\ and V%. Detailed balance can 
be proven to be satisfied 

But there is a drawback with this linear algorithm. The probability P a could lie 
outside the interval between and 1 since it is estimated stochastically. Once the 
probability bound is violated, detailed balance is lost and systematic bias will show 
up. It is hoped that if the bound violation occurs rarely (e.g. once every million 
updates), the systematic bias might be small enough so that the expectation values 
of various quantities can still be correct within statistical errors [|J. 

Within the framework of the linear algorithm, there are at least three ways to 
reduce the probability of bound violation. 

1. The choice of A + = 0.0 , A - = 1/2 in ref. can be generalized to 

A + = 0.0 , A- = -L_ . ( 2 ) 
1 + a 

With a > 1, the allowed range of (e AH ) is proportionally increased so that the 
probability of upper bound violation can be tamed, albeit at the expense of a 
lower acceptance rate. 

2. One can choose a better ordering criterion to reduce the bound violation. When 
the ordering criterion is not correlated with AH, the upper bound is violated 
more frequently than the case in which the AH itself is used as the ordering 
criterion (i.e. AH > or < 0). However, one cannot calculate AH exactly - a 
premise for the problem; the best one can do is to estimate AH stochastically 
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without bias. As long as it can be made reasonably close to the true value of 
AH, it can be used as the ordering criterion. This should greatly reduce the 
probability of upper-bound violation. 

3. Usually it is AH that can be estimated without bias. Simple exponentiation 
of this estimator, i.e. e^ Ai ^ inevitably yields a bias. However, it is demon- 
strated by Bhanot and Kennedy |2| that an unbiased estimator (e AH ) can be 
constructed via a series expansion of the exponential in terms of the powers 
of independent unbiased estimator (AH). One can reduce the variance of the 
estimated acceptance probability by considering the variants of this series ex- 
pansion. This will help reduce the probability of both the lower-bound and 
upper-bound violations. We tried several variants, the best turns out to be 

(e AH ) ee ITf =1 e% (3) 

where x\,X2, ....,Xn are identical, independent unbiased estimators of and 
each e x is estimated by the series expansion developed by Bhanot and Kennedy [0 : 

(e«> = 1 + x x {l + i* a (l + i* 8 (l+....))) (4) 

where the coefficients in the Taylor expansion are interpreted as probabilities. 
The procedure goes as follows. First, one sets (e x ) = 1 + x\. Then one adds 
x\X2 to (e x ) with probability ~; otherwise one stops. If it is not stopped, one 
then continues to add to (e x ) with probability |, and so on. It is easy 

to prove that the above estimator is unbiased. One can also calculate its 
variance which is 



/, A Win r AH 2 +S 2 AH AH 2 +S 2 AH / N l/v ,T77 

Var((e AH )) = {e^^~ +2fe~ -e~j^~) =b> |7v_ e 2AH 

AH/N -{AH +5 2 )/N 2 

(5) 

where 5 2 = AH 2 - AH is the variance of AH from the noise estimate. It is 
smaller than those of the other variants of the series expansion we considered 
such as e N ^i=i Xi and J2i=i eXi - It can be shown that only a finite number of 
terms are needed in actual calculations. 



Although one can improve the performance of the linear algorithm with the above 
techniques, there are still problems inherent to the algorithm which are impossible to 
eradicate. First of all, if we assume that the estimator of the acceptance probability 
has a Gaussian distribution, then the long tails of the Gaussian distribution always 
exist. As a result, the probability bound violations will never be completely excluded. 
Secondly, the linear algorithm with a stochastic estimator is a volume-squared algo- 
rithm. Thus, in realistic simulations of problems such as lattice QCD with dynamical 
fermions, it would be very costly to estimate the fermion determinant with sufficient 
accuracy in order to put the probability bound violations under control. The volume 
dependence can be seen from the following consideration. Suppose we take the best 
estimator in Eq. (|3]) to calculate the acceptance probability, the variance is primarily 
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a function of 5 2 /N when AH ~ 0, according to Eq. (|5]). If in addition 5 2 /N is small, 
one can do an expansion and find that the variance goes like S 2 /N. Usually 5 2 is 
proportional to the volume or the size of the problem. Therefore, if one wants to 
keep the bound violations the same as the volume grows, one needs to have a larger 
N. Consequently, N grows as the volume V and the cost will be proportional to V 2 
since the cost of the stochastic estimator itself is usually proportional to V, e.g. for 
a sparse matrix. 

In order to completely remove any systematic bias coming from probability bound 
violations and to reduce the cost of simulation on large volumes, one needs to go be- 
yond the linear algorithm. In this letter, we propose a new algorithm which will 
achieve these goals. We shall see that the new algorithm eliminates the upper bound 
violation and absorbs the negative sign of the lower bound violation into the observ- 
able. These are achieved by introducing auxiliary variables and going back to the 
Metropolis accept/reject criterion. 



2 A Noisy Monte Carlo Algorithm 

Let us consider a model with Hamiltonian H(U) where U collectively denotes the 
dynamical variables of the system. The major ingredient of the new approach is to 
transform the noise for the stochastic estimator into stochastic variables. To this end, 
the stochastic series expansion in Eq. (Q) is written in terms of an integral of the 
stochastic variables £. 

e- H(U) = f[Dt]Pt®mt), (6) 

where f(U,£) is an unbiased estimator of e~ H ^ u ' from the stochastic variable £ and 
P^ is the probability distribution for £. Here, we use £ as a collective symbol for all 
the stochastic variables. 

Given this integral in Eq. @, the partion function of the model can be written 

as 

Z = J[DV]e- H W 

= J[DU)[DZ)Ps(Of(U,0. (7) 

Originally, we have a configuration space of U. Now it is enlarged to (U, £) with the 
inclusion of the stochastic variable £. From now on, we shall specify a configuration 
or state in this enlarged space. 

The next step is to address the lower probability-bound violation. One first ob- 
serves that 

/([/,£) = sign(/)|/(f/,£)| • (8) 
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Since sign(/), the sign of /, is a state function, we can write the expectation value of 
the observable O as 

(O) = J [DU)[Dt)Ps(0O(U)sign(f) \f(U,0\/Z. (9) 

After redefining the partition function to be 

Z = f[DU][D$P&)\mt)\, (10) 

which is semi-positive definite, the expectation of O in Eq. (^]) can be rewritten as 

(O) = (0(U) sign(/))/(sign(/)). (11) 

As we see, the sign of f(U,£) is not a part of the probability any more but a part in the 
observable. Notice that this reinterpretation is possible because the sign of f(U,£) is 
a state function which depends on the configuration of U and £. We note that in the 
earlier linear accept/reject case in Eq. ([!]), the acceptance criterion depends on the 
transition probability (e AH ) = (e H ( Ul >~ H ( U2 ') which cannot be factorized into a ratio 
of state functions such as {e H<yUl) ) / (e H( - u ^). Consequently, the sign of the acceptance 
probability in the linear algorithm |l] cannot be swept into the observable as in Eq. 

(0). 

It is clear then, to avoid the problem of lower probability-bound violation, the 
accept /reject criterion has to be factorizable into a ratio of the new and old proba- 
bilities so that the sign of the estimated f(U,£) can be absorbed into the observable. 
This leads us back to the Metropolis accept/reject criterion which incidentally cures 
the problem of upper probability-bound violation at the same time. It turns out two 
accept/reject steps are needed in general. The first one is to propose updating of U 
via some procedure while keeping the stochastic variables £ fixed. The acceptance 
probability P a is 

Pa(Uu £ - U 2 , = min(l, • (12) 

The second accept/reject step involves the refreshing of the stochastic variables £ 
according to the probability distribution P^(0 while keeping U fixed. The acceptance 
probability is 

P a (U, 6 - U, 6) = min(l, • (13) 

It is obvious that there is neither lower nor upper probability-bound violation in 
either of these two Metropolis accept/reject steps. Furthermore, it involves the ratios 
of separate state functions so that the sign of the stochastically estimated probability 
f(U,£) can be absorbed into the observable as in Eq. QTTj) . 

Detailed balance can be proven to be satisfied. For the first step which involves the 
updating U\ — > U 2 with £ = fixed, one can show for the case 1/(^2,01/1/(^1)01 < 1 

PeaiUuQPciU! - U 2 )P a (U 1 ,^ ^2,0 
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- P eq (U 2 ,0P c (U 2 ^U 1 )P a (U 2 ,^U l ,,0 

= m) \m,0\Pe(Ui^ u 2 ) 

- n(0 1/(^2,01 Pc{u 2 -i/i) 
= p e (0l/(c/ 2 ,0l^(c/i^c/ 2 ) 

- i^(0 1/(^2,01 ^2 - C/i) = 0. (14) 

where P eq is the equilibrium distribution and P c is the probability of choosing a 
candidate phase space configuration satisfying the reversibility condition 

Pc{Ux - C/ 2 ) = P c (t/ 2 - C/i) . (15) 

Detailed balance for the second step which invokes the updating 6 — > £ 2 with [/ fixed 
can be similarly proved. For the case |/(C/ 2 , 01 < 1> we have 

P eg (U, 6) P c (6 - 6) 6 - f/, 6) 
= ^(6)1/(^,6)1^(6) 

= 0. (16) 

Therefore, this new algorithm does preserve detailed balance and is completely unbi- 
ased. 

We have tested this noisy Monte Carlo (NMC) on a 5-state model which is the 
same used in the linear algorithm [Q] for demonstration. Here, P c (Ui — > U 2 ) = | and 
we use Gaussian noise to mimic the effects of the noise in the linear algorithm and 
the stochastic variables £ in NMC. We calculate the average energy with the linear 
algorithm and the NMC. Some data are presented in Table 1. Each data point is 
obtained with a sample of one million configurations. The exact value for the average 
energy is 0.180086. 

We first note that as long as the variance of the noise is less than or equal to 0.06, 
the statistical errors of both the NMC and linear algorithm stay the same and the 
results are correct within two a. To the extend that the majority of the numerical 
effort in a model is spent in the stochastic estimation of the probability, this admits 
the possibility of a good deal of saving through the reduction of the number of noise 
configurations, since a poorer estimate of the probability with less accuracy works just 
as well. As the variance becomes larger than 0.06, the systematic bias of the linear 
algorithm sets in and eventually becomes intolerable, while there is no systematic 
bias in the NMC results. In fact, we observe that the NMC result is correct even 
when the percentage of negative probability reaches as high as 44%, although the 
statistical fluctuation becomes larger due to the fact that the negative sign appears 
more frequently. We should remark that the Metropolis acceptance rate is about 92% 
for the smallest noise variance. It decreases to 85% when the varicance is 0.1 and 
it drops eventually to 78% for the largest variance 50.0. Thus, there is no serious 
degrading in the acceptance rate when the variance of the noise increases. 
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Table 1: Data for the average energy obtained by NMC and the linear algorithm [|TJ. 
They are obtained with a sample size of one million configurations each. Var is the 
variance of the noise estimator for e~ H in NMC and e AH in the linear algorithm, a 
in Eq. (|) is set to 1.0 in the latter case. Negative Sign denotes the percentage of 
times when the sign of the estimated probability is negative in NMC. Low/High Vio. 
denotes the percentage of times when the low/high probability-bound is violated in 
the linear algorithm. The exact average energy is 0.180086. 



Var 


NMC 


Negative Sign 


Linear 


Low Vio. 


High Vio. 


0.001 


0.17994(14) 


0% 


0.18024(14) 


0% 


0% 


0.002 


0.18016(14) 


0% 


0.17994(14) 


0% 


0% 


0.005 


0.18017(14) 


0% 


0.17985(14) 


0% 


0% 


0.008 


0.17993(14) 


0% 


0.17997(14) 


0% 


0% 


0.01 


0.18008(14) 


0% 


0.17991(14) 


0% 


0% 


0.06 


0.17992(14) 


0.008% 


0.17984(14) 


0.001% 


0.007% 


0.1 


0.17989(14) 


0.1% 


0.17964(14) 


0.1% 


0.3% 


0.2 


0.18015(15) 


1.6% 


0.18110(13) 


1% 


1% 


0.5 


0.1800(3) 


5% 


0.1829(1) 


3% 


4% 


1.0 


0.1798(4) 


12% 


0.1860(1) 


6% 


7% 


5.0 


0.1795(6) 


28% 


0.1931(1) 


13% 


13% 


6.5 


0.1801(5) 


30% 


0.1933(1) 


13% 


14% 


10.0 


0.1799(9) 


38% 








15.0 


0.1798(9) 


38% 








20.0 


0.1803(11) 


39% 








30.0 


0.1800(13) 


41% 








50.0 


0.1794(17) 


44% 









We further observe that the variance of the NMC result does not grow as fast as 
the variance of the noise. For example, the variance of the noise changes by a factor of 
833 from 0.06, where the probability-bound violation starts to show up in the linear 
algorithm, to 50.0. But the variance of the NMC result is only increased by a factor 
of (0. 0017/0. 00014) 2 = 147. Thus, if one wants to use the linear algorithm to reach 
the same result as that of NMC and restricts to configurations without probability- 
bound violations, it would need 833 times the noise configurations to perform the 
stochastic estimation in order to bring the noise variance from 50.0 down to 0.06 but 
147 times less statistics in the Monte Carlo sample. In the case where the majority 
of the computer time is consumed in the stochastic estimation, it appears that NMC 
can be more economical than the linear algorithm. 



6 



3 Lattice QCD with Fermion Determinant 



One immediate application of NMC is lattice QCD with dynamical fermions. The 
action is composed of two parts - the pure gauge action S g (U) and a fermion action 
Sf{U) = —Tr In M(U). Both are functionals of the gauge link variables U. Con- 
sidering the Hybrid Monte Carlo || approach with explicit Tr In M for the fermion 
action, we first enlarge the phase space from (U) to (U,p) where p denotes the con- 
jugate momentum of U. The partion function is 

Z = J[DU] [Dp\e- H{ P*> (17) 

where H(U,p) = p 2 /2 + S g (U) + Sp(U). To apply NMC, we introduce stocahstic 
variables to estimate the fermion determinant 

detM(U) = e™ 1 = J [DQPfc) f(U,{) (18) 

where f(U,£) will be given later. The partition function is then 

Z = J [DU)[Dp)[D$Ps(0 e' H ^ f(U,0, (19) 

where H G = p 2 /2 + S g . In the Hybrid Monte Carlo, the configuration (U,p) is up- 
dated with molecular dynamics. In this case, the probability of choosing a candidate 
configuration is P c (Ui, p x — > U 2 , p 2 ) = S[(U 2 , p 2 ) - (Ui(r), Px(r))\ where U x {t) and 
Px{t) are the evolved values at the end of the molecular dynamics trajectory after r 
steps. Using the reversibility condition 

P C (U U p x -> U 2 , p 2 ) = P C (U 2 , -p 2 -> U u - Pl ), (20) 

one can again prove detailed balance with two corresponding Metropolis steps as in 
Eqs. (H) and 

To find out the explicit form of f(U,£), we note that the fermion determinant can 
be calculated stochastically as a random walk process 

tv in A/f m , , Tr In M , Tr In M . ... , . 

e TrlnM = y + THnM(1 + ___ (1 + ___(...))) j (2 1) 

as described in Eq. (HI). This can be expressed in the following integral 



e Tr In M 



/oo „\ oo 

i=l J0 n=2 



[1 +r ! \\nMr !1 (l + e(p 2 - ^| In Mr/ 2 (1 + % 3 - ^)r/l lnM % (-], (22) 

where P v {i]i) is the probability distribution for the stochastic variable rji. It can be 
the Gaussian noise or the Z 2 noise (P v (r]i) = 5(\r]i\ — 1) in this case). The latter 
is preferred since it has the minimum variance pL p n is a stochastic variable with 
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uniform distribution between and 1. This sequence terminates stochastically in 
finite time and only the seeds from the pseudo-random number generator need to be 
stored in practice. Comparing this to Eq. (p~8|) , the function f(U,r),p) (Note the £ 
in Eq. (|18|) is represented by two stochastic variables rj and p here) is represented by 
the part of the integrand between the the square brackets in Eq. (p2|). One can then 
use the efficient Pade-Z 2 algorithm |J to calculate the 77jlnM77j in Eq. (0). All the 
techniques for reducing the variance of the estimator without bias developed before || 
can be applied here. It is learned that after the unbiased subtraction, the error on the 
stochastic estimation of the Tr In M difference between two fermion matrices at the 
beginning and end of the molecular dynamics trajectory for an 8 3 x 12 lattice with 
Wilson fermion at /3 — 6.0 and k = 0.154 can be reduced from 12.0 down to 0.49 with 
a mere 100 noise configurations ||. This implies a 49% error on the determinant ratio 
and makes the application of the noisy Monte Carlo algorithm rather promising. 

Finally, there is a practical concern that Tr In M can be large so that it takes 
a large statistics to have a reliable estimate of e TrlnM from the series expansion in 
Eq. fl2"2"|). In general, for the Taylor expansion e x = ^x n /n\, the series will start to 
converge when x n /n\ > x n+l /(n + 1)!. This happens at n = x. For the case x = 100, 
this implies that one needs to have more than 100! stochastic configurations in the 
Monte Carlo integration in Eq. (p2[) in order to have a convergent estimate. Even 
then, the error bar will be very large. To avoid this difficulty, one can implement the 
following strategy. First one note that since the Metropolis accept/reject involves the 
ratio of exponentials, one can subtract a universal number xq from the exponent x 
in the Taylor expansion without affecting the ratio. Second one can use the trick in 
Eq. (|3|) to diminish the value of the exponent. In other words, one can replace e x 
with ( e ^- x °y N ) N to satisfy | x — Xq\/N < 1. The best choice for xq is x, the mean 
of x. In this case, the variance in Eq. (|5[) becomes e s / N — 1. Comparing with Eq. 
(|5p, one can verify that it is smaller than the case without x subtraction by e 2x . We 
should mention that this is not an issue in the Kennedy-Kuti algorithm where the 
accept/reject criterion involves the transition probability e AH = e H ( Ul >~ H ( u ^ not the 
ratio of probabilities as in the Metropolis criterion. 



4 Summary and Discussion 

In summary, the new noisy Monte Carlo algorithm proposed here is free from the 
problem of probability-bound violations which afflicts the linear accept/reject algo- 
rithm, especially when the variance of the noise is large. The upper-bound violation 
is avoided by going back to the Metropolis accept /reject. The lower-bound violation 
problem is tackled by grouping the sign of the estimated probability with the ob- 
servable. With the probability-bound violation problem solved, NMC is a bona fide 
unbiased stochastic algorithm as demonstrated in the 5-state model. Furthermore, 
it is shown in the 5-state model that it is not necessary to have an extremely small 
variance in the stochastic estimation. With the encouraging results from the Pade-Z2 
estimation of the Tr InM ||, one has a reasonable hope that the V 2 dependence of 
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NMC will be tamed with a smaller prefactor. We will apply NMC to the dynamical 
fermion updating in QCD and compare it to the HMC with pseudo-fermions 0. 
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